Automatic Classification of Scientific Records using the German Subject Heading Authority File (SWD)

نویسندگان

  • Christian Wartena
  • Maike Sommer
چکیده

The following paper deals with an automatic text classification method which does not require training documents. For this method the German Subject Heading Authority File (SWD), provided by the linked data service of the German National Library is used. Recently the SWD was enriched with notations of the Dewey Decimal Classification (DDC). In consequence it became possible to utilize the subject headings as textual representations for the notations of the DDC. Basically, we we derive the classification of a text from the classification of the words in the text given by the thesaurus. The method was tested by classifying 3826 OAI-Records from 7 different repositories. Mean reciprocal rank and recall were chosen as evaluation measure. Direct comparison to a machine learning method has shown that this method is definitely competitive. Thus we can conclude that the enriched version of the SWD provides high quality information with a broad coverage for classification of German scientific articles.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Matching Multi-lingual Subject Vocabularies

Most libraries and other cultural heritage institutions use controlled knowledge organisation systems, such as thesauri, to describe their collections. Unfortunately, as most of these institutions use different such systems, unified access to heterogeneous collections is difficult. Things are even worse in an international context when concepts have labels in different languages. In order to ov...

متن کامل

Authority Control of People and Organizations on the Semantic Web

Authors and documents with identical titles are common in the digital library environment. In order to manage identities correctly, authority control is used by library and information scientists for disambiguating and cross-referencing entity names. We argue that the benefits of traditional authority control can be enhanced by using techniques and technologies of the Semantic Web, leading to s...

متن کامل

Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...

متن کامل

Dimensionality Reduction and Improving the Performance of Automatic Modulation Classification using Genetic Programming (RESEARCH NOTE)

This paper shows how we can make advantage of using genetic programming in selection of suitable features for automatic modulation recognition. Automatic modulation recognition is one of the essential components of modern receivers. In this regard, selection of suitable features may significantly affect the performance of the process. Simulations were conducted with 5db and 10db SNRs. Test and ...

متن کامل

Library of Congress Classification as linked data

In 2009 and in 2011, the Library of Congress made two of its largestauthority files –Subject Headings and Names available as linked data via LC’slinked data service, id.loc.gov. Both are offered in MADS/RDF and SKOS. It isLC’s objective, in 2012, to publish another of its largest authority files as linkeddata: LC Classification. However, whereas the source records for Subject He...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012